HIV Integrase 3' processing inhibitors SAR

Support vector machine (SVM) models for predicting inhibitors of the 3’ processing step of HIV-1 integrase

Xuan, S.Y.; Wang, M.L.; Kang, H.; Kirchmair, J.; Tan, L.; Yan, A.X.*

Molecular Informatics, 2013, 32(9-10), 811-826.

Read the original article

Download the original data

Download Supporting Information

Inhibition of the 3’ processing step of HIV-1 integrase by small molecule inhibitors is one of the most promising strategies for the treatment of AIDS. Using a support vector machine (SVM) approach, we developed six classification models for predicting 3’P inhibitors. The models are based on up to 48 selected molecular descriptors and a comprehensive data set of 1253 molecules, with measured activities ranging from nanomolar to micromolar IC50 values. Model B2, the most robust SVM model, obtains a prediction accuracy, sensitivity, specificity and Matthews correlation coefficient (MCC) of 93 %, 81 %, 94 % and 0.67 on the test set, respectively. The presence of hydrogen bonding features and hydrophilicity in general were identified as key determinants of inhibitory activity. Further important properties include molecular refractivity, π atom charge, total charge, lone pair electronegativity, and effective atom polarizability. Comparative fragment-based analysis of the active and inactive molecules corroborated these observations and revealed several characteristic structural elements of 3’P inhibitors. The models built in this study can be obtained from the authors.
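
The four test-set statistics quoted above all follow from the counts of a binary confusion matrix. The sketch below shows the standard definitions. The function name and the counts are assumptions for illustration: the counts are not reported on this page, and were back-calculated so that they agree with the rounded values listed for Model B2 (716 test molecules).

```python
from math import sqrt


def classification_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity, MCC) from confusion-matrix counts.

    tp/fn: active molecules predicted active/inactive.
    tn/fp: inactive molecules predicted inactive/active.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in the test set")
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # fraction of actives recovered (SE)
    specificity = tn / (tn + fp)   # fraction of inactives recovered (SP)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when a predicted class is empty.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, sensitivity, specificity, mcc


# Assumed counts (back-calculated, not taken from the paper).
acc, se, sp, mcc = classification_metrics(tp=62, fn=15, tn=602, fp=37)
print(f"accuracy={acc:.2%} SE={se:.2%} SP={sp:.2%} MCC={mcc:.4f}")
```

With these assumed counts the function gives 92.74 % accuracy, 80.52 % sensitivity, 94.21 % specificity and an MCC of 0.6707. Because inactive molecules far outnumber actives in this setting, MCC is more informative than accuracy alone.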

Model	Algorithm	Descriptors	Splitting method	Training set size	Training set accuracy (%)	Training set 5-fold cross-validation accuracy (%)	Training set 10-fold cross-validation accuracy (%)	Training set leave-one-out (LOO) cross-validation accuracy (%)	Test set size	Test set sensitivity (%)	Test set specificity (%)	Test set accuracy (%)	Test set MCC
Model A1	SVM	41 MOE	Random	493	95.94	82.76	82.56	83.57	760	69.33	88.91	86.97	0.4641
Model A2	SVM	41 MOE	Kohonen’s self-organizing map (SOM)	537	92.92	79.70	79.33	79.70	716	68.83	93.58	90.92	0.5726
Model B1	SVM	41 MOE + 7 RDF	Random	493	99.39	84.38	83.37	85.40	760	69.33	90.07	88.03	0.4859
Model B2	SVM	41 MOE + 7 RDF	Kohonen’s self-organizing map (SOM)	537	98.32	79.70	79.89	81.56	716	80.52	94.21	92.74	0.6707
Model C1	SVM	MACCS	Random	493	96.35	81.74	83.77	84.18	760	38.51	96.32	86.05	0.4465
Model C2	SVM	MACCS	Kohonen’s self-organizing map (SOM)	537	92.92	81.01	81.38	80.45	716	51.92	96.24	89.80	0.5478
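
The cross-validation columns in the table differ only in how many folds the training set is divided into; leave-one-out (LOO) is the limiting case with one molecule per fold. The following is a minimal sketch of that partitioning in plain Python, not the procedure used in the paper. The names `k_fold_indices`, `cross_validated_accuracy` and `fit_and_score` are hypothetical, and `fit_and_score` stands in for training an SVM on the training indices and counting correct predictions on the held-out indices.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding through the shuffled list gives folds whose sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, val_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, val_idx


def cross_validated_accuracy(n_samples, n_folds, fit_and_score, seed=0):
    """Pool the correct held-out predictions over all folds.

    fit_and_score(train_idx, val_idx) is a hypothetical callback that must
    return the number of correctly predicted validation samples.
    """
    correct = sum(
        fit_and_score(train_idx, val_idx)
        for train_idx, val_idx in k_fold_indices(n_samples, n_folds, seed)
    )
    return correct / n_samples


# 537 is the training-set size of the SOM-split models in the table.
five_fold = list(k_fold_indices(537, 5))
leave_one_out = list(k_fold_indices(537, 537))
print(len(five_fold), len(leave_one_out))
```

Each molecule is held out exactly once, so the pooled accuracy is computed over the whole training set regardless of the number of folds.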

Main project members

宣首逸

PhD student

王茂林